Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search for .mem in same directory as Node script #4942

Closed
wants to merge 2 commits into from

Conversation

RReverser
Copy link
Collaborator

@RReverser RReverser commented Feb 13, 2017

This seems to be a common source of issues when trying to run optimised code with separate .js.mem from Node.js, while not being in the same working directory as script itself.

It was possible to workaround this and set search path using memoryInitializerPrefixURL since 7e9e24, but I think it makes sense to have a nice default too, which would be to search for .js.mem file in the same directory as .js itself when running using Node.js.

Added regression test for this behaviour, which also revealed that SpiderMonkey / V8 shells already got this right, so this PR aligns Node.js behaviour to others.

(Note: this is reopening of #4936, cc @juj)

…ed code with separate .js.mem from Node.js, while not being in the same working directory.

It was possible to workaround this and set search path using memoryInitializerPrefixURL since 7e9e24, but I think it makes sense to have a nice default too, which would be to search for .js.mem file in the same directory as .js itself when running using Node.js.

Added regression test for this behaviour, which also revealed that SpiderMonkey / V8 shells already got this right, so this PR aligns Node.js behaviour to others.
@RReverser RReverser changed the title This seems to be a common source of issues when trying to run optimis… Search for .mem in same directory as Node script Feb 13, 2017
@kripken
Copy link
Member

kripken commented Feb 13, 2017

A general question, why is node.js special here? Is it just because we have an easy way to get the dir of the .js file?

@RReverser
Copy link
Collaborator Author

RReverser commented Feb 13, 2017

@kripken As I said above, pure JS shells (V8 / SpiderMonkey) already do the right thing:

which also revealed that SpiderMonkey / V8 shells already got this right, so this PR aligns Node.js behaviour to others

That is, read in those engines looks up in script's directory, not current working directory, but Node's fs.* funcs look in the working dir, so to fix it and align behavior, we need to pass an absolute path, and __dirname does just that.

Theoretically we could implement something similar for browsers too, but there 1) document.currentScript is not too widespread, 2) it's polyfills are too error-prone to rely upon and 3) it's easier to override Module.memoryInitializerPrefixURL for the browser before execution of compiled script.

@kripken
Copy link
Member

kripken commented Feb 14, 2017

Thanks, now I see.

src/postamble.js Outdated
@@ -43,6 +43,8 @@ if (memoryInitializer) {
memoryInitializer = Module['locateFile'](memoryInitializer);
} else if (Module['memoryInitializerPrefixURL']) {
memoryInitializer = Module['memoryInitializerPrefixURL'] + memoryInitializer;
} else if (ENVIRONMENT_IS_NODE) {
memoryInitializer = __dirname + '/' + memoryInitializer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment here saying what this does

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, somehow missed this - added a comment.

@kripken
Copy link
Member

kripken commented Feb 14, 2017

This may break existing users, I think? People that had the mem file in the current dir but the js in another?

@RReverser
Copy link
Collaborator Author

@kripken Do you think it's a common case when they had files in different directories? Given that emcc generates them in the same directory, the only way that would happen is if they moved .mem file manually after the build, but that seems pretty unlikely case.

@kripken
Copy link
Member

kripken commented Feb 17, 2017

Yeah, I agree it's very unlikely. Possibly also something we should have recommended people not do anyhow :) But I'm not 100% sure, and I worry about annoying users. Perhaps @juj has a sense of the risk here?

One possibility to reduce the risk is look in both places, but it seems like that isn't easy to implement in the code, so maybe not worth the complexity?

@RReverser
Copy link
Collaborator Author

@kripken I've added a comment as per request. Anything else missing / what are the generic thoughts on whether this can be moved forward?

@kripken
Copy link
Member

kripken commented Mar 8, 2017

What's left is to assess the risk of breakage, I think. @juj: maybe you have time to give your thoughts now, after GDC?

Another possible thing to reduce the risk here is to improve the error message in the breakage case (i.e., when we fail to find the mem file, say on node "it should be in the same dir as the js file")?

@RReverser
Copy link
Collaborator Author

Another possible thing to reduce the risk here is to improve the error message in the breakage case (i.e., when we fail to find the mem file, say on node "it should be in the same dir as the js file")?

Sounds reasonable to me.

@Deamon87
Copy link
Contributor

I have some thoughts about this code...

This will add another node-js dependency into resulting js file. Atm emscripten already loads "fs" "dir" module, which makes webpack "unhappy". And to make webpack accept current code, I had to preamble module file with "require=function(){}" line.

For the issue itself, emscripten result can be preambled with

var Module = {};
Module['locateFile'] = function (filePath) { /* Some user's code here */ }

This allows users to set path to file themselves and put ".mem" file anywhere they want.

Same for browsers

var Module = {};
Module['readAsync'] = function (url, success, error) { /* Some user's code here */ }

Using this method for overriding function I was able to load .mem file using axios lib and custom url.

@RReverser
Copy link
Collaborator Author

RReverser commented Mar 10, 2017

@Deamon87 It can be worked around, yes (I mentioned this in issue description), but the underlying issue in inconsistency of file search between engines still exists and looking in the same folder is sane default to have, as this is what most users expect.

As for webpack - it's very flexible and you should configure it to use aliases or ignore specific modules if your issue is that they are accidentally included; preambling module with custom require() is not really a good option as it might break WebPack expectations cc @TheLarkInn.

Anyway, I just don't think having problems with webpack configuration should be a blocker for fixing actual Node.js behavior.

@RReverser
Copy link
Collaborator Author

RReverser commented Mar 10, 2017

And btw, this change doesn't break webpack either as it knows what __dirname is and transforms it to / or what you have configured in publicPath (even though that branch won't be executed in browser anyway).

@Deamon87
Copy link
Contributor

@RReverser
Ty. You gave me some interesting ideas didnt know of :)

@RReverser
Copy link
Collaborator Author

@kripken @juj Friendly ping on this :)

@@ -43,6 +43,9 @@ if (memoryInitializer) {
memoryInitializer = Module['locateFile'](memoryInitializer);
} else if (Module['memoryInitializerPrefixURL']) {
memoryInitializer = Module['memoryInitializerPrefixURL'] + memoryInitializer;
} else if (ENVIRONMENT_IS_NODE) {
// Look for .mem file in the same folder as script by default
memoryInitializer = __dirname + '/' + memoryInitializer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions do come to mind:

  • does node.js have a path join function that could be used here? I.e. is it guaranteed that __dirname does not end in a '/' so that this won't do a /path/to/foo//foo.mem? On Windows, should this be a backslash '\' or is node.js satisfied or even expect '/' on Windows as well?
  • are other file loads with node.js affected similarly? e.g. why is .mem file special here for node.js?

Last, I wonder if this would be better to read

if (typeof Module['locateFile'] === 'function') {
  memoryInitializer = Module['locateFile'](memoryInitializer);
} else if (Module['memoryInitializerPrefixURL']) {
  memoryInitializer = Module['memoryInitializerPrefixURL'] + memoryInitializer;
}

+ if (ENVIRONMENT_IS_NODE && isThisARelativeUrl(memoryInitializer)) {
+   // Relative paths should be treated relative to the folder where the .js script resides in
+   memoryInitializer = __dirname + '/' + memoryInitializer;
+ }

since Module['locateFile'] and Module['memoryInitializerPrefixURL'] may also be relative - do we ever want to load files relative to the cwd in node.js?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively to make it more generic, perhaps we could have a Module['documentDirectory'] which would be initialized to __dirname on node.js by default, and the above type of code would always read relative directories with respect to Module['documentDirectory'] if that exists. That way we'd avoid having to put in node.js specifics at each place of use (assuming we will have more of them in the future, e.g. when node.js adds .wasm support?), and if we need a similar thing for SpiderMonkey or other shells, it would be decoupled as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. is it guaranteed that __dirname does not end in a '/' so that this won't do a /path/to/foo//foo.mem?

Yeah, it is.

On Windows, should this be a backslash '' or is node.js satisfied or even expect '/' on Windows as well?

Windows itself doesn't mind forward slashes and accepts either, so I wouldn't bother.

e.g. why is .mem file special here for node.js?

.mem is special only because Emscripten 1) generates it alongside the .js but 2) searches in current working dir and not alongside .js, which causes app to immediately crash if they don't match (e.g. when invoked by Cargo or other builder from top-level working directory). For regular file reads by the app itself, I think it's reasonable to leave behaviour as-is because they're not generated alongside .js in the first place.

Module['memoryInitializerPrefixURL'] may also be relative - do we ever want to load files relative to the cwd in node.js

Yeah, as said above, I think for normal cases we want to load files relative to cwd, so that program would behave in the same way as when compiled natively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I don't understand the idea for Module['documentDirectory'] - what does "document" mean here?

Perhaps we need something like Module['pwd']?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken Overriding current working directory sounds irrelevant to this issue though? We want cwd to be still the same (at least where it exists for real, like in Node.js), it's only .mem that needs to be fixed to be found alongside .js as I explained above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@curiousdannii I already commented about the browser situations above couple of times - TL;DR it's quite different because browser doesn't have a notion of "pwd" unlike all the console shells, and not having a solution for everything is not a reason not to fix at least those we can. Anyway, this is blocked due to the other reasons now.

The idea to add this path name is a good solution. As commented earlier, I think it would be good to separate the location of initialization (platform specific) from the location of use (platform independent), so that we don't need to sprinkle if (ENVIRONMENT_IS_NODE) x = __dirname + x; statements to wherever we need to load items relative to the main .js document in question. I understand there is only one such place where we are fixing this for now, but as we discussed above, there will likely be more of such fixes in the future (.data, .wasm?), so having something like

    // in preamble:
    if (!Module['documentDirectory']) {
        if (ENVIRONMENT_IS_NODE) Module['documentDirectory'] = __dirname + '/';
    }

    // in location(s) of use:
    memoryInitializer = Module['documentDirectory'] + memoryInitializer;

This way when we expand in the future to having multiple locations that need this fix, we can just slap the Module['documentDirectory'] + bit to them without having to worry about node.js specifics in such locations of use. That will keep platform specifics down to that one line at init time. It's a small change to the current proposed form, but I think it would show a good form for future fixes. @kripken, did you have a thought about the name of such a variable?

Copy link
Collaborator Author

@RReverser RReverser Jun 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so that we don't need to sprinkle if (ENVIRONMENT_IS_NODE) x = __dirname + x; statements to wherever we need to load items relative to the main .js document in question

Oh of course, I was thinking about having a generic function rather than copy-pasting in the worst case.

Anyway, this idea sounds good to me, except the naming which might be confusing, especially on the browser side.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about scriptDirectory? It should state pretty clearly that it should be a directory with generated script (and, consequently, other artifacts it depends upon).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module['scriptDirectory'] sounds good to me. @kripken ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me too.

@juj
Copy link
Collaborator

juj commented Apr 27, 2017

Maybe I don't understand the idea for Module['documentDirectory'] - what does "document" mean here?

In browser environment, the field document.URL can be used to obtain the URL to the current document, i.e. the .html file. (https://developer.mozilla.org/en-US/docs/Web/API/Document/URL, http://stackoverflow.com/questions/16984943/how-to-get-the-directory-part-of-current-url-in-javascript). The "document" lingo was reused from that concept, i.e. it could be used to refer to the directory where the current "document" resides in, so that it could get initialized in preamble.js by something like

if (!Module['documentDirectory']) {
  if (ENVIRONMENT_IS_BROWSER) Module['documentDirectory'] = document.URL.substr(0,document.URL.lastIndexOf('/'));
  else if (ENVIRONMENT_IS_NODE) Module['documentDirectory'] = __dirname;
}

then at the place of use, it could complete relative paths with respect to Module['documentDirectory']. That way the platform specifics will be contained to one place.

@RReverser
Copy link
Collaborator Author

@juj Please take another look at what I wrote above. Your proposed documentDirectory != the same as __dirname in Node.js, and generally you don't need to use document.URL because relative paths are already resolved relatively to the document in a browser.

I mentioned browser side above that there we could / should use document.currentScript to retrieve the location of the script itself (this is often not same location as a document), but it's not cross-browser, so I didn't include it in this PR.

documentDirectory, while potentially a useful addition, has nothing to do with this fix, as .mem should be search for alongside .js, and not in the current working directory (if in console) / directory of the document itself (if in browser).

@RReverser
Copy link
Collaborator Author

RReverser commented Apr 27, 2017

TL;DR - .mem is special because it's generated alongside .js and thus should be found alongside .js. It is already working correctly in other console engines, but not when using Node.js readFile - this PR fixes that. It has nothing to do with generic functions for finding / reading files, and they don't need to be changed - at least not in scope of this PR.

If you prefer not to fix Node.js behaviour here, I can close the PR, just want to make sure that we're not going forth and back on exact same questions.

@juj
Copy link
Collaborator

juj commented Apr 28, 2017

.mem is special because it's generated alongside .js and thus should be found alongside .js

Emscripten also generates other build outputs alongside .js in the same directory:
- if --separate-asm flag is passed, then a file out.asm.js is generated,
- if --preload-file flag is passed, then a out.data file is generated,
- if -s WASM=1 (-s BINARYEN=1) flag is passed, then a file out.wasm is generated,
- if -s USE_PTHREADS=1 is passed, then a file pthread-main.js is generated, and
- if -s FETCH=1 is passed, then a file fetch-worker.js is generated.

these are similar to how the .mem file, which is why I was asking for a general solution of "what's the directory where the main .js file resides". Although it is possible that none of these yet apply to Node.js (no SAB or Wasm at the moment), except perhaps the .data file? But at some point they will, so sharing the same solution with each would be good?

@RReverser
Copy link
Collaborator Author

RReverser commented Apr 28, 2017

Ah yeah in this sense .data should apply too. As for .js files - as long as you're using Node's require('./some'), it already searches alongside main .js, so they shouldn't be applicable. Not sure about .wasm yet, as its support in Node.js is currently under a flag only, so depends what the final implementation will look like.

@RReverser
Copy link
Collaborator Author

Regarding .wasm - at least at the moment, looks like it does have same issue #4542.

@saschanaz
Copy link
Collaborator

Can we fix Module['read'] itself? Shouldn't it internally use __dirname?

@RReverser
Copy link
Collaborator Author

@saschanaz No, see above. __dirname should be used only for script's dependencies that are generated alongside the script by the Emscripten itself, while changing Module['read'] would break reading from current working directory that most programs depend upon.

@RReverser
Copy link
Collaborator Author

@juj So what is the latest plan on this given the information above? Do you want to introduce some generic solution for these generated dependencies?

@curiousdannii
Copy link
Contributor

Another one is the Emterpreter data, which does currently apply to Node.

@mvasilkov
Copy link

In case anyone else finds this thread in search for a solution, this can be worked around in Node.js for an already-built module by using require-inject-scope. This allows you to inject the memoryInitializerPrefixURL field.

(My issue with this was, I had an Emscripten-built module, and needed to load it from another folder.)

@curiousdannii
Copy link
Contributor

I think this can be closed as #5368 has been merged.

@kripken
Copy link
Member

kripken commented Sep 17, 2018

I believe so, thanks @curiousdannii

@kripken kripken closed this Sep 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants